HTML API: Handle \f in skip_script_data tag matching #9402

Open. Wants to merge 2 commits into base: trunk.

Member

@sirreal sirreal commented Aug 7, 2025

Trac ticket: https://core.trac.wordpress.org/ticket/63738

Address a minor HTML API mis-parse of script contents, where \f is not recognized as a valid trailing/termination character of SCRIPT tag names.


In this case, the \f form feed should be recognized as the end of the script tag name and close the script:

<script></script␌>🎉
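
As a rough illustration of the rule described above, the following minimal sketch checks whether a single byte may end a SCRIPT tag name. The function name `ends_script_tag_name` is hypothetical and is not part of the HTML API. The character list is an assumption that mirrors the condition under review in this PR (space, tab, line feed, form feed, carriage return, `/`, and `>`).

```php
<?php
/**
 * Hypothetical helper (not part of the HTML API): reports whether the
 * byte that follows "script" ends the tag name.
 */
function ends_script_tag_name( string $c ): bool {
	// strspn() counts the leading bytes of $c that appear in the mask,
	// so a single-byte $c yields 1 only when it is a terminator.
	return 1 === strlen( $c ) && 1 === strspn( $c, " \t\n\f\r/>" );
}
```

With this sketch, `ends_script_tag_name( "\f" )` is true, so `</script␌>` would be treated as a complete closing tag name, while a letter such as `x` is not a terminator.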


This is another example of the same issue. Again, \f form feed should be recognized as the end of the script tag name and enter the double-escaped state (this script tag is not closed and consumes the rest of the document):

<script><!--<script␌</script>🎉

In this case, <script< is correctly recognized as not being a sequence that should transition from escaped to double-escaped. However, the parser incorrectly advances beyond the following < character, which starts the script close tag, and so does not close correctly at </script>.
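
The state transitions described in the two examples above can be sketched as a small scanner. This is a simplified illustration and not the actual skip_script_data code: the function name `script_content_closes` and its signature are hypothetical, it treats a recognized `</script` tag name as a close without checking that a `>` follows, and it assumes the same terminator list as the condition under review.

```php
<?php
/**
 * Hypothetical, simplified scanner (not the actual skip_script_data code).
 * $at is the byte offset just past the opening <script> tag. Returns true
 * when a closing </script tag name is found outside the double-escaped state.
 */
function script_content_closes( string $html, int $at ): bool {
	$state = 'unescaped';
	$end   = strlen( $html );

	while ( $at < $end ) {
		// Only "-" and "<" can start a sequence that matters here.
		$at += strcspn( $html, '-<', $at );
		if ( $at >= $end ) {
			break;
		}

		// "-->" leaves the escaped and double-escaped states.
		if ( '-->' === substr( $html, $at, 3 ) ) {
			$state = 'unescaped';
			$at   += 3;
			continue;
		}

		if ( '<' !== $html[ $at ] ) {
			++$at;
			continue;
		}

		// "<!--" enters the escaped state, but only from the unescaped state.
		if ( '<!--' === substr( $html, $at, 4 ) ) {
			$state = 'unescaped' === $state ? 'escaped' : $state;
			$at   += 2; // Keep the dashes so that "<!-->" still ends the escape.
			continue;
		}

		$name_at    = $at + 1;
		$is_closing = '/' === substr( $html, $name_at, 1 );
		if ( $is_closing ) {
			++$name_at;
		}

		if ( 0 !== strcasecmp( substr( $html, $name_at, 6 ), 'script' ) ) {
			++$at;
			continue;
		}

		// Resume right after "script" without skipping the next byte,
		// because it may be the "<" that starts a real closing tag.
		$at = $name_at + 6;

		// The tag name must end here, and "\f" counts as an ending.
		if ( 1 !== strspn( $html, " \t\n\f\r/>", $at, 1 ) ) {
			continue;
		}

		if ( 'escaped' === $state && ! $is_closing ) {
			$state = 'double-escaped';
		} elseif ( 'double-escaped' === $state && $is_closing ) {
			$state = 'escaped';
		} elseif ( $is_closing ) {
			return true;
		}
	}

	return false;
}
```

Under these assumptions, `script_content_closes( "<script><!--<script\f</script>", 8 )` returns false because the form feed moves the scanner into the double-escaped state, while the same input with `<script<` in place of `<script\f` returns true because the scanner does not skip the `<` that follows the tag name.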


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions bot commented Aug 7, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

@sirreal sirreal requested review from dmsnell and Copilot August 7, 2025 14:09
Comment on lines +1629 to +1635
'>' !== $c &&
' ' !== $c &&
"\n" !== $c &&
'/' !== $c &&
"\t" !== $c &&
"\f" !== $c &&
"\r" !== $c
Member Author

I ordered these by how often I'd expect the character to be seen in this position. I don't expect any real performance improvements from that part of the change, but I also don't see any downside to having > and the space character appear as the first and second match opportunities.

Member

My understanding is that these comparisons are all likely to be performed in parallel and executed before the CPU even reaches these lines, so yeah, I would guess this is true, but without measurement I also lean toward not knowing. It shouldn’t matter in any case, and unless someone has realistic benchmarks on realistic data, I would be skeptical of any performance claims about the position of these items.

github-actions bot commented Aug 7, 2025

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

@@ -2012,6 +2012,7 @@ public function test_script_tag_parsing( string $input, bool $closes ) {
public static function data_script_tag(): array {
return array(
'Basic script tag' => array( '<script></script>', true ),
'Basic script tag with </script\f> close' => array( "<script></script\f>", true ),
Member

Given that we’re testing a class of terminations here, we could extend these to add all of the relevant characters. For now I think this patch is great to go anyway, but I do think there is some valuable work for someone in refactoring some of the existing tests from the original build of the Tag Processor.

There’s probably something to be said for recreating the state machine from the spec and testing each of its branches.
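
One way to cover the whole class of terminations is to generate a case per terminator character. The sketch below is only an illustration: the provider name `data_script_close_tag_terminators` is hypothetical, and the expectation that every case closes the script is an assumption based on the condition under review, not a result taken from the test suite.

```php
<?php
/**
 * Hypothetical data provider: one case per byte that may end the
 * closing SCRIPT tag name. Each case is array( $input, $closes ),
 * matching the shape used by data_script_tag().
 */
function data_script_close_tag_terminators(): array {
	$terminators = array(
		'space'           => ' ',
		'tab'             => "\t",
		'line feed'       => "\n",
		'form feed'       => "\f",
		'carriage return' => "\r",
		'solidus'         => '/',
	);

	$cases = array();
	foreach ( $terminators as $label => $c ) {
		// Assumption: each of these bytes ends the tag name, so the script closes.
		$cases[ "Script tag closed with {$label} before >" ] = array( "<script></script{$c}>", true );
	}

	return $cases;
}
```

The returned cases could be merged into the existing data set or exposed through a separate provider method, depending on how the tests are organized.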

Member

@dmsnell dmsnell left a comment

It baffles me how we missed this earlier on, since it is so clear in the spec.

Before merging, would you be interested in two changes:

  • Move the comment above the if structure so it’s not awkwardly inside the condition? If you prefer it stay I trust your thoughts, but I think it could make this condition stand out more cleanly if we had an empty line above and below it and the comment explaining this.
  • Add a second * to make it a PHPDoc comment so that the @see links integrate with IDEs more smoothly, and remove the leading - so that they stand on their own. Otherwise, if we keep them as list items, {@see https://…} might be more appropriate.

The comment is verbose and I don’t think it needs as much explanation; even a single link to the SCRIPT parsing section would suffice. Still, being verbose is a decent default for comments.
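
The two suggestions above might look roughly like the following sketch. The wrapper function `continues_tag_name` and the comment wording are hypothetical, since the PR's actual comment is not shown here, and the link is left elided as in the review comment.

```php
<?php
/**
 * Hypothetical wrapper around the condition under review, used only
 * to show the suggested comment placement.
 */
function continues_tag_name( string $c ): bool {

	/**
	 * The tag name only ends at whitespace, "/", or ">".
	 * Any other byte means this is not a SCRIPT tag name.
	 *
	 * @see https://…
	 */
	if (
		'>' !== $c &&
		' ' !== $c &&
		"\n" !== $c &&
		'/' !== $c &&
		"\t" !== $c &&
		"\f" !== $c &&
		"\r" !== $c
	) {
		return true;
	}

	return false;
}
```

Placing the doc comment above the `if`, with an empty line before it, keeps the condition itself free of interleaved comment lines.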

2 participants